| Title | Galaxy |
|---|---|
| Training dataset: | None |
| Questions: | |
| Objectives: | |
| Estimated time: | 1h 15 min |
When we run a bioinformatic analysis against a reference genome, we need to provide a single reference file. The problem with segmented genomes, such as that of the Crimea Congo virus, is that the databases hold a separate file for each segment. Here we are going to learn how to load the different segments of a genome into Galaxy and concatenate them to create a single fasta file that can be used for further analyses. We are also going to learn how to count the number of sequences in a multifasta file and the number of nucleotides in each sequence of a fasta file.
First of all, go to the Galaxy Web Server in Europe, where you will see a display such as this one:
The display has 4 different elements:

1. The first one, in yellow, is the Title panel with the buttons:
   - Home (house): To go to the home page in Spanish
   - Workflows: To go to the workflow manager
   - Visualize: Displays the visualization manager and options
   - Share Data: Displays the sharing options
   - Help: Displays all the help menus available
   - Login or Register
   - Galaxy Training Materials (graduation cap): Displays the Galaxy Trainings list
   - Enable/Disable scratchbook (9 squares)
2. The left side panel, in blue, with all the tools in this Galaxy mirror
3. The central panel, in red, which will let you run analyses and view outputs
4. The right panel, in green, with the history record

The first thing to do is sign up, so that you can save your history. To do that, follow these steps:

1. Select Login or Register in the header panel.
2. Select Register here.
3. Fill in the registration information. :warning: Use an email account you can access right now, because you will be asked to confirm your e-mail address.
4. Log into your e-mail and verify your Galaxy account.
5. Log in with your credentials.

Now select the Home button and return to the home page. We are going to learn how to manage the history, which is in the right panel. To do this, we will follow these steps:
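Now we are going to load the data. In this case we are going to use the Crimea Congo reference genome. The Crimea Congo genome is composed of 3 segments, each with its own code (the codes below are taken from the names of the files we are about to upload):

- S segment: DQ133507
- M segment: EU037902
- L segment: EU044832
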
To load these fragments into Galaxy, we have to follow these steps:

1. In the left side panel, select Upload Data
2. In the new panel, select Paste/Fetch Data
3. Then copy the following block of text:
https://raw.githubusercontent.com/BU-ISCIII/galaxy_virologist_training/one_week_4day_format/exercises/data/S_DQ133507.fasta
https://raw.githubusercontent.com/BU-ISCIII/galaxy_virologist_training/one_week_4day_format/exercises/data/M_EU037902.fasta
https://raw.githubusercontent.com/BU-ISCIII/galaxy_virologist_training/one_week_4day_format/exercises/data/L_EU044832.fasta
With this, our data is loading into Galaxy. You can see that each job is given a different number, so you can keep track of the order of your jobs with it.
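
If you also want a local copy of the same three files, they could be fetched outside Galaxy with a few lines of Python. This is only a minimal sketch: the base URL and file names are copied from the list above, while `segment_urls` and `download_segments` are hypothetical helper names chosen for illustration, and the download step needs network access.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and file names copied from the URLs pasted into Galaxy.
BASE_URL = (
    "https://raw.githubusercontent.com/BU-ISCIII/galaxy_virologist_training/"
    "one_week_4day_format/exercises/data/"
)
SEGMENT_FILES = {
    "S": "S_DQ133507.fasta",
    "M": "M_EU037902.fasta",
    "L": "L_EU044832.fasta",
}


def segment_urls():
    """Return a mapping from segment name to its download URL."""
    return {segment: BASE_URL + name for segment, name in SEGMENT_FILES.items()}


def download_segments(dest_dir="."):
    """Download each segment fasta into dest_dir (hypothetical helper, needs network)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for segment, url in segment_urls().items():
        target = dest / SEGMENT_FILES[segment]
        urlretrieve(url, str(target))  # saves the remote file to the target path
        paths.append(target)
    return paths
```

Calling `download_segments("data")` would place the three fasta files in a `data` folder. Nothing is downloaded until that function is called.
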
The jobs can have three different states:

1. Waiting: Your jobs will have a grey color and a clock on their left side. In this state your jobs are waiting to enter the Galaxy server.
2. Running: Your jobs will have an orange color and rotating dots on their left side. In this state your jobs are running on the Galaxy server.
3. Done: Your jobs will have a green color. Your data is ready to be used.

Now we can start using our data. First of all, we are going to see what these fasta files look like. There are different ways to do this:

1. Select the :eye: icon to the right of the file name. For the first time, our center panel has changed, and it now displays the content of the fasta file.

When we display this file summary, we obtain additional options to process this file:
Note: If you click on the file name again, the summary disappears.
Now we are going to rename all the fasta files we uploaded to Galaxy. To do this, we have to click on the pencil icon that appears next to each file name. This will display a new central window with the different editing options for each file:
This screen offers several options. Starting from the right:
:warning: Select Save button to save the changes.
We are going to rename the files as shown here:
Now we are going to use the fasta files uploaded to Galaxy to run tools. To run tools we have to:
When we select the tool, its options appear in the center panel, together with different information about the tool we want to run. :warning: These options are tool specific. This means each tool has its own options.

1. Tool name, version and options to save and share the tool
2. The input dataset options:
   - We can select data from the history
   - Upload data from a collection
   - Upload a dataset (the upload dataset pop-up will appear)
   - Browse a dataset (you can browse datasets from the history)
3. Insert new dataset blocks (not needed in our case)
4. Execute button
5. Tool information:
   - :warning:
   - What it does
   - Examples
   - Citation

To concatenate the samples, we will follow these steps:

1. In Datasets to concatenate:
   - Press the Ctrl key on your keyboard
   - Select the three fasta files while still pressing the Ctrl key
2. Press Execute

Once we have pressed Execute, a new central panel window will appear and our job will be queued:

1. At the top of the panel (blue) you have a summary of what we have just run. In our case, 3 input datasets are involved in a single process, with a single output.
2. At the foot of the panel (red) you have some recommendations from Galaxy on how to process your data after the process we have just run.
3. In the history (yellow) we now have a new entry, number 4, with the results of our job.

Once our job is green, we can see the results by clicking on the :eye: icon. Now we can see the three segment sequences, headers included, in a single fasta file.
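
For readers who want to reproduce this step on their own computer, the same idea can be sketched in Python. This is a minimal illustration of concatenating fasta files, not the code Galaxy runs; `concatenate_fasta` is a hypothetical helper name, and the file names in the usage note are assumptions.

```python
from pathlib import Path


def concatenate_fasta(inputs, output):
    """Write the contents of each input fasta, in order, to one output file."""
    with open(output, "w") as out:
        for path in inputs:
            text = Path(path).read_text()
            out.write(text)
            # If a file lacks a final newline, add one so that the next
            # header (the line starting with ">") begins on its own line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return output
```

For example, `concatenate_fasta(["S_DQ133507.fasta", "M_EU037902.fasta", "L_EU044832.fasta"], "crimea_congo_ref.fasta")` would write the three segments one after another, assuming those files exist in the working directory.
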
Now we are going to rename the fasta file as follows:

1. Click on the :pencil: icon
2. Write Crimea Congo Ref Genome in the Name box
3. Press Save

| First Question | Answer |
|---|---|
| How do I create a fasta reference for the fragmented Crimea Congo genome? | By concatenating the different fragments of the genome |
Now that we have our concatenated fasta file, we can check that everything is fine by scrolling through the genome and checking that the three fragments are present, or we can use another tool to count the number of sequences in a fasta file and the number of nucleotides in each sequence.
To do this, we are going to:

1. Search for fasta in the tool search box.
2. Select Fasta Statistics Display summary statistics for a fasta file
3. In fasta or multifasta file, select the multiple datasets option
4. With the Ctrl key pressed, select the 3 fragments and the multifasta file
5. Press the Start button.

Now we have 4 jobs running, because this tool will run one statistics process for each fasta file we selected.
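
The two numbers we are interested in here, the number of sequences and the number of nucleotides per sequence, can also be computed outside Galaxy. The following is a minimal sketch in plain Python, not the implementation of the Fasta Statistics tool; `fasta_lengths` is a hypothetical helper name, and it assumes every header in the file is unique.

```python
def fasta_lengths(path):
    """Return {header: sequence length} for every record in a fasta file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A header line starts a new record.
                # Assumption: headers are unique within the file.
                name = line[1:]
                lengths[name] = 0
            elif name is not None:
                # Sequence lines may be wrapped, so add up every line.
                lengths[name] += len(line)
    return lengths
```

With the result stored in `lengths`, `len(lengths)` gives the number of sequences in the file and `sum(lengths.values())` gives the total number of nucleotides.
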
Now we are going to see the statistics summary for each fasta file. To do this, we have to select the :eye: icon in each of the Fasta Statistics outputs.
For the S fragment, we are going to see the number of sequences inside the fasta file, and the number of nucleotides. We are going to:
Now we are going to repeat this process for the rest of the fasta files:
M fragment
How many nucleotides are in the M fragment?
5364 nt
L fragment
How many nucleotides are in the L fragment?
12150 nt
Crimea Congo Genome
How many sequences and nucleotides are in the Crimea Congo reference genome?
3 sequences (3 fragments)
19187 nt
Now we can answer the second question.
| Second Question | Answer |
|---|---|
| How many nucleotides does each fragment of the Crimea Congo genome have? | 1673 nt in the S fragment, 5364 nt in the M fragment, 12150 nt in the L fragment |
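
As a quick sanity check, the three fragment lengths should add up to the total reported for the concatenated genome. A small sketch using only the numbers from the answers above (the variable names are illustrative):

```python
# Fragment lengths in nucleotides, taken from the answers above.
segment_lengths = {"S": 1673, "M": 5364, "L": 12150}

# The concatenated reference should contain exactly the sum of the fragments.
total = sum(segment_lengths.values())
print(total)  # 19187, matching the total reported for the Crimea Congo genome
```

If the total did not match, it would suggest that a fragment was missing or selected twice during concatenation.
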
Now that we know that the reference genome for the whole Crimea Congo virus has been built correctly, we can use it as the reference genome for further analyses in this same history, or save it to use on our own computer. To do so:

1. Select the name of the fasta you want to download: 4: Crimea Congo Ref Genome
2. Select the Save button in the panel that appears.